343 research outputs found

From MultiJEDI to MOUSSE: Two ERC Projects for innovating multilingual disambiguation and semantic parsing of text

Author: Basile Valerio
Navigli Roberto
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

In this paper we present two interrelated projects funded by the European Research Council (ERC) aimed at addressing and overcoming the current limits of lexical semantics: MultiJEDI (Section 2) and MOUSSE (Section 4). We also present the results of Babelscape (Section 3), a Sapienza spin-off company with the goal of making the project outcomes sustainable in the long term

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Two knowledge-based methods for High-Performance Sense Distribution Learning

Author: Navigli Roberto
Pasini Tommaso
Publication date: 01/01/2018

Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org

Archivio della ricerca- Università di Roma La Sapienza

Large-Scale information extraction from textual definitions through deep syntactic and semantic analysis

Author: Delli Bovi Claudio
Navigli Roberto
Telesca Luca
Publication date: 01/01/2015

We present DEFIE, an approach to large-scale Information Extraction (IE) based on a syntactic-semantic analysis of textual definitions. Given a large corpus of definitions we leverage syntactic dependencies to reduce data sparsity, then disambiguate the arguments and content words of the relation strings, and finally exploit the resulting information to organize the acquired relations hierarchically. The output of DEFIE is a high-quality knowledge base consisting of several million automatically acquired semantic relations

CiteSeerX

Archivio della ricerca- Università di Roma La Sapienza

VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling

Author: Di Fabio Andrea
Conia Simone
Navigli Roberto
Publication date: 01/01/2019

We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org

Crossref

Archivio della ricerca- Università di Roma La Sapienza

MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)

Author: Navigli Roberto
Tedeschi Simone
Publication date: 01/01/2022

Named Entity Recognition (NER) is the task of identifying named entities in texts and classifying them through specific semantic categories, a process which is crucial for a wide range of NLP applications. Current datasets for NER focus mainly on coarse-grained entity types, tend to consider a single textual genre and to cover a narrow set of languages, thus limiting the general applicability of NER systems. In this work, we design a new methodology for automatically producing NER annotations, and address the aforementioned limitations by introducing a novel dataset that covers 10 languages, 15 NER categories and 2 textual genres. We also introduce a manually-annotated test set, and extensively evaluate the quality of our novel dataset on both this new test set and standard benchmarks for NER. In addition, in our dataset, we include: i) disambiguation information to enable the development of multilingual entity linking systems, and ii) image URLs to encourage the creation of multimodal systems. We release our dataset at https://github.com/Babelscape/multinerd

Archivio della ricerca- Università di Roma La Sapienza

A Unified multilingual semantic representation of concepts

Author: Camacho Collados José
Navigli Roberto
Pilehvar Mohammed Taher
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2015

Semantic representation lies at the core of several applications in Natural Language Processing. However, most existing semantic representation techniques cannot be used effectively for the representation of individual word senses. We put forward a novel multilingual concept representation, called MUFFIN, which not only enables accurate representation of word senses in different languages, but also provides multiple advantages over existing approaches. MUFFIN represents a given concept in a unified semantic space irrespective of the language of interest, enabling cross-lingual comparison of different concepts. We evaluate our approach in two different evaluation benchmarks, semantic similarity and Word Sense Disambiguation, reporting state-of-the-art performance on several standard datasets

CiteSeerX

Crossref

Online Research @ Cardiff

Archivio della ricerca- Università di Roma La Sapienza

NASARI: a novel approach to a Semantically-Aware Representation of items

Author: Camacho Collados José
Navigli Roberto
Pilehvar Mohammed Taher
Publication date: 01/01/2015

The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have in the main based their representation either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the complementary knowledge of both these types of resource. Thanks to its use of explicit semantics combined with a novel cluster-based dimensionality reduction and an effective weighting scheme, our representation attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering. We are releasing our vector representations at http://lcl.uniroma1.it/nasari/

CiteSeerX

Crossref

Online Research @ Cardiff

Archivio della ricerca- Università di Roma La Sapienza

Embeddings for word sense disambiguation: an evaluation study

Author: Iacobacci Ignacio Javier
Navigli Roberto
Pilehvar Mohammed Taher
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

Recent years have seen a dramatic growth in the popularity of word embeddings mainly owing to their ability to capture semantic information from massive amounts of textual content. As a result, many tasks in Natural Language Processing have tried to take advantage of the potential of these distributional models. In this work, we study how word embeddings can be used in Word Sense Disambiguation, one of the oldest tasks in Natural Language Processing and Artificial Intelligence. We propose different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and perform a deep analysis of how different parameters affect performance. We show how a WSD system that makes use of word embeddings alone, if designed properly, can provide significant performance improvement over a state-of-the-art WSD system that incorporates several standard WSD features

Archivio della ricerca- Università di Roma La Sapienza

SensEmbed: Learning sense embeddings for word and relational similarity

Author: Iacobacci Ignacio Javier
Navigli Roberto
Pilehvar Mohammed Taher
Publication date: 01/01/2015

Word embeddings have recently gained considerable popularity for modeling words in different Natural Language Processing (NLP) tasks including semantic similarity measurement. However, notwithstanding their success, word embeddings are by their very nature unable to capture polysemy, as different meanings of a word are conflated into a single representation. In addition, their learning process usually relies on massive corpora only, preventing them from taking advantage of structured knowledge. We address both issues by proposing a multifaceted approach that transforms word embeddings to the sense level and leverages knowledge from a large semantic network for effective semantic similarity measurement. We evaluate our approach on word similarity and relational similarity frameworks, reporting state-of-the-art performance on multiple datasets

CiteSeerX

Archivio della ricerca- Università di Roma La Sapienza

Conditions, constraints and contracts: on the use of annotations for policy modeling.

Author: Bottoni Paolo Gaspare
Navigli Roberto
Parisi Presicce Francesco
Publication venue: Technical University of Aachen
Publication date: 01/01/2015

Organisational policies express constraints on generation and processing of resources. However, application domains rely on transformation processes, which are in principle orthogonal to policy specifications, and domain rules and policies may evolve in a non-synchronised way. In previous papers, we have proposed annotations as a flexible way to model aspects of some policy, and showed how they could be used to impose constraints on domain configurations, how to derive application conditions on transformations, and how to annotate complex patterns. We extend the approach by: allowing domain model elements to be annotated with collections of elements, which can be collectively applied to individual resources or collections thereof; proposing an original construction to solve the problem of annotations remaining orphan when annotated resources are consumed; introducing a notion of contract, by which a policy imposes additional pre-conditions and post-conditions on rules for deriving new resources. We discuss a concrete case study of linguistic resources, annotated with information on the licenses under which they can be used. The annotation framework allows forms of reasoning such as identifying conflicts among licenses, enforcing the presence of licenses, or ruling out some modifications of a license configuration

Archivio della ricerca- Università di Roma La Sapienza

Electronic Communications of the EASST (European Association of Software Science and Technology)